A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon

Authors

Frédérick Garcia

Seydina M. Ndiaye

Abstract

Many reinforcement learning algorithms, such as Q-Learning and R-Learning, are adaptive methods for solving infinite-horizon Markovian decision problems when no model is available. In this article we consider the particular framework of non-stationary finite-horizon Markov Decision Processes (MDPs). After establishing a relationship between the finite-horizon total reward criterion and the finite-horizon average-reward criterion, we define Q_H-Learning and R_H-Learning for finite-horizon MDPs. We then introduce the Ordinary Differential Equation (ODE) method to conduct a learning rate analysis of Q_H-Learning and R_H-Learning. R_H-Learning appears to be a version of Q_H-Learning with matrix-valued step-sizes, the corresponding gain matrix being very close to the optimal matrix that results from the ODE analysis. Experimental results confirm this performance hierarchy.

Similar resources

Reinforcement Learning with Time

This paper steps back from the standard infinite horizon formulation of reinforcement learning problems to consider the simpler case of finite horizon problems. Although finite horizon problems may be solved using infinite horizon learning algorithms by recasting the problem as an infinite horizon problem over a state space extended to include time, we show that such an application of infinite ...

Reinforcement Learning in Neural Networks: A Survey

In recent years, research on reinforcement learning (RL) has focused on bridging the gap between adaptive optimal control and bio-inspired learning techniques. Neural network reinforcement learning (NNRL) is among the most popular algorithms in the RL framework. Using neural networks enables RL to search for optimal policies more efficiently in several real-life applicat...

Multicast Routing in Wireless Sensor Networks: A Distributed Reinforcement Learning Approach

Wireless Sensor Networks (WSNs) consist of independent distributed sensors with storage, processing, sensing and communication capabilities to monitor physical or environmental conditions. WSNs face a number of challenges because of limited battery power, communication, computation and storage space. In recent years, computational intelligence approaches such as evolutionar...

Nearly Optimal Exploration-Exploitation Decision Thresholds

While in general trading off exploration and exploitation in reinforcement learning is hard, under some formulations relatively simple solutions exist. Optimal decision thresholds for the multi-armed bandit problem, one for the infinite horizon discounted reward case and one for the finite horizon undiscounted reward case are derived, which make the link between the reward horizon, uncertainty ...

Publication date: 1998
